
Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning


Abstract

Recent progress has been made in using attention-based encoder-decoder frameworks for video captioning. However, most existing decoders apply the attention mechanism to every generated word, including both visual words (e.g., "gun" and "shooting") and non-visual words (e.g., "the", "a"). Yet these non-visual words can be easily predicted by a natural language model without considering visual signals or attention, and imposing the attention mechanism on them can mislead the decoder and decrease the overall performance of video captioning. To address this issue, we propose a hierarchical LSTM with adjusted temporal attention (hLSTMat) approach for video captioning. Specifically, the proposed framework utilizes temporal attention to select specific frames for predicting the related words, while the adjusted temporal attention decides whether to depend on the visual information or the language context information. In addition, hierarchical LSTMs are designed to simultaneously consider both low-level visual information and high-level language context information to support video caption generation. To demonstrate the effectiveness of our proposed framework, we test our method on two prevalent datasets, MSVD and MSR-VTT, and experimental results show that our approach outperforms the state-of-the-art methods on both datasets.
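The adjusted temporal attention described in the abstract amounts to a learned gate between the attended visual context and the decoder's language state. The snippet below is a minimal PyTorch sketch of that idea under assumed dimensions, not the authors' released hLSTMat implementation; the module and parameter names (AdjustedTemporalAttention, feat_proj, gate, etc.) are illustrative, and the exact gating formulation in the paper may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdjustedTemporalAttention(nn.Module):
    """Sketch: temporal attention over frame features plus a gate that
    decides how much the current word relies on visual evidence versus
    the decoder's language context."""

    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)
        self.gate = nn.Linear(hidden_dim, 1)           # adjusted-attention gate
        self.ctx_proj = nn.Linear(feat_dim, hidden_dim)

    def forward(self, frame_feats, h_lang):
        # frame_feats: (batch, n_frames, feat_dim); h_lang: (batch, hidden_dim)
        # Temporal attention: score each frame against the current decoder state.
        e = self.score(torch.tanh(
            self.feat_proj(frame_feats) + self.hidden_proj(h_lang).unsqueeze(1)
        )).squeeze(-1)                                  # (batch, n_frames)
        alpha = F.softmax(e, dim=-1)
        c_visual = torch.bmm(alpha.unsqueeze(1), frame_feats).squeeze(1)

        # beta in [0, 1]: weight on visual context vs. language context.
        beta = torch.sigmoid(self.gate(h_lang))         # (batch, 1)
        c_adjusted = beta * self.ctx_proj(c_visual) + (1.0 - beta) * h_lang
        return c_adjusted, alpha, beta


# Toy usage: 26 frames of 2048-d CNN features, a 512-d decoder state.
attn = AdjustedTemporalAttention(feat_dim=2048, hidden_dim=512, attn_dim=256)
ctx, alpha, beta = attn(torch.randn(4, 26, 2048), torch.randn(4, 512))
```

In this reading, a gate value near zero lets the decoder lean on its language context, which is the intended behaviour for non-visual words such as "the" and "a", while a value near one pushes it toward the attended frame features used for visual words.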
